Deep Inverse Q-learning with Constraints

Neural Information Processing Systems

Popular Maximum Entropy Inverse Reinforcement Learning approaches require the computation of expected state visitation frequencies for the optimal policy under an estimate of the reward function. This usually requires intermediate value estimation in the inner loop of the algorithm, slowing down convergence considerably. In this work, we introduce a novel class of algorithms that needs to solve the MDP underlying the demonstrated behavior only once to recover the expert policy. This is possible through a formulation that exploits a probabilistic behavior assumption for the demonstrations within the structure of Q-learning. We propose Inverse Action-value Iteration, which recovers the underlying reward of an external agent analytically in closed form. We further provide an accompanying class of sampling-based variants that do not depend on a model of the environment. We show how to extend this class of algorithms to continuous state spaces via function approximation and how to estimate a corresponding action-value function, leading to a policy as close as possible to the policy of the external agent while optionally satisfying a list of predefined hard constraints. We evaluate the resulting algorithms, Inverse Action-value Iteration, Inverse Q-learning, and Deep Inverse Q-learning, on the Objectworld benchmark, showing a speedup of up to several orders of magnitude compared to (Deep) Max-Entropy algorithms. We further apply Deep Constrained Inverse Q-learning to the task of learning autonomous lane changes in the open-source simulator SUMO, achieving competent driving after training on data corresponding to 30 minutes of demonstrations.
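To make the probabilistic behavior assumption concrete, here is a minimal, hypothetical sketch (not the paper's implementation): if the expert is assumed to act Boltzmann-rationally, pi(a|s) proportional to exp(Q(s,a)), then log-ratios of observed action frequencies recover differences of action values in closed form, since the per-state normalizer cancels.

```python
import numpy as np

def q_differences_from_demonstrations(action_counts):
    """action_counts: (n_states, n_actions) matrix of expert action counts.

    Under the Boltzmann assumption pi(a|s) = exp(Q(s,a)) / Z(s), taking
    log-ratios of action probabilities cancels the normalizer Z(s) and
    yields Q(s,a) - Q(s,a0) for a fixed reference action a0.
    """
    probs = action_counts / action_counts.sum(axis=1, keepdims=True)
    log_pi = np.log(probs)
    return log_pi - log_pi[:, [0]]   # Q-differences relative to action 0

# Toy demonstrations: in state 0 the expert mostly picks action 0,
# in state 1 it mostly picks action 1.
counts = np.array([[8.0, 1.0, 1.0],
                   [2.0, 6.0, 2.0]])
dq = q_differences_from_demonstrations(counts)
```

The recovered differences determine the expert policy up to a per-state constant, which is what lets the full MDP be solved only once in the paper's framing.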


ConVerse: Benchmarking Contextual Safety in Agent-to-Agent Conversations

Gomaa, Amr, Salem, Ahmed, Abdelnabi, Sahar

arXiv.org Artificial Intelligence

As language models evolve into autonomous agents that act and communicate on behalf of users, ensuring safety in multi-agent ecosystems becomes a central challenge. Interactions between personal assistants and external service providers expose a core tension between utility and protection: effective collaboration requires information sharing, yet every exchange creates new attack surfaces. We introduce ConVerse, a dynamic benchmark for evaluating privacy and security risks in agent-agent interactions. ConVerse spans three practical domains (travel, real estate, insurance) with 12 user personas and over 864 contextually grounded attacks (611 privacy, 253 security). Unlike prior single-agent settings, it models autonomous, multi-turn agent-to-agent conversations where malicious requests are embedded within plausible discourse. Privacy is tested through a three-tier taxonomy assessing abstraction quality, while security attacks target tool use and preference manipulation. Evaluating seven state-of-the-art models reveals persistent vulnerabilities; privacy attacks succeed in up to 88% of cases and security breaches in up to 60%, with stronger models leaking more. By unifying privacy and security within interactive multi-agent contexts, ConVerse reframes safety as an emergent property of communication.


Learning control strategy in soft robotics through a set of configuration spaces

Ménager, Etienne, Duriez, Christian

arXiv.org Artificial Intelligence

The ability of a soft robot to perform specific tasks is determined by its contact configuration, and transitioning between configurations is often necessary to reach a desired position or manipulate an object. Based on this observation, we propose a method for controlling soft robots that involves defining a graph of configuration spaces. Different agents, whether learned or not (convex optimization, expert trajectory, and collision detection), use the structure of the graph to solve the desired task. The graph and the agents are part of the prior knowledge that is intuitively integrated into the learning process. They are used to combine different optimization methods, improve sample efficiency, and provide interpretability. We construct the graph based on the contact configurations and demonstrate its effectiveness through two scenarios, a deformable beam in contact with its environment and a soft manipulator, where it outperforms the baseline in terms of stability, learning speed, and interpretability.
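The graph-of-configuration-spaces idea can be sketched with a toy example, assuming made-up configuration names and agent labels: nodes are contact configurations, each edge records which agent (learned or not) handles the transition, and a simple search over the graph yields the sequence of agents to run for a task.

```python
from collections import deque

# Toy graph: nodes are contact configurations, edge values name the agent
# responsible for that transition. All labels here are illustrative.
GRAPH = {
    "free": {"beam_contact": "convex_optimization"},
    "beam_contact": {"free": "expert_trajectory", "grasp": "learned_policy"},
    "grasp": {},
}

def plan(graph, start, goal):
    """Breadth-first search over configurations; returns the agent sequence."""
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        node, agents = frontier.popleft()
        if node == goal:
            return agents
        for nxt, agent in graph[node].items():
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, agents + [agent]))
    return None  # goal unreachable from start
```

For example, reaching "grasp" from "free" dispatches the convex-optimization agent first and the learned policy second, which is the sense in which the graph combines different optimization methods.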


Network control by a constrained external agent as a continuous optimization problem

Nys, Jannes, Heuvel, Milan van den, Schoors, Koen, Merlevede, Bruno

arXiv.org Machine Learning

Social science studies dealing with control in networks typically resort to heuristics or to describing the static distribution of control. Optimal policies, however, require interventions that optimize control over a socioeconomic network subject to real-world constraints. We integrate optimization tools from deep learning with network science into a framework that can optimize such interventions in real-world networks. We demonstrate the framework in the context of corporate control, where it makes it possible to characterize the vulnerability of strategically important corporate networks to sensitive takeovers, a pressing contemporary policy challenge. The framework produces insights that are relevant for governing real-world socioeconomic networks and opens up new research avenues for improving our understanding and control of such complex systems.
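As a hedged illustration of the continuous-optimization framing (the objective, penalty form, and all names below are assumptions, not the paper's formulation), binary intervention decisions can be relaxed into sigmoid-transformed logits and optimized by gradient ascent under a soft budget constraint:

```python
import numpy as np

def optimize_interventions(weights, budget, steps=500, lr=0.5, lam=10.0):
    """weights[i]: control gained by intervening on node i (toy objective).

    Relax binary decisions x_i in {0, 1} to sigmoid(logit_i) in (0, 1) and
    ascend the gradient of sum_i w_i * x_i, penalizing budget violations.
    """
    logits = np.zeros(len(weights))
    for _ in range(steps):
        x = 1.0 / (1.0 + np.exp(-logits))   # relaxed decisions in (0, 1)
        sig = x * (1.0 - x)                 # derivative of the sigmoid
        grad = weights * sig                # gradient of the toy objective
        if x.sum() > budget:                # soft budget constraint
            grad -= lam * sig
        logits += lr * grad
    return 1.0 / (1.0 + np.exp(-logits))

# With a budget of two interventions, the two highest-value nodes dominate.
scores = optimize_interventions(np.array([3.0, 2.0, 0.1]), budget=2.0)
```

In practice one would use automatic differentiation over a realistic control objective defined on the ownership network; the sketch only shows how the discrete intervention choice becomes a continuous optimization problem.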


Active World Model Learning with Progress Curiosity

Kim, Kuno, Sano, Megumi, De Freitas, Julian, Haber, Nick, Yamins, Daniel

arXiv.org Artificial Intelligence

World models are self-supervised predictive models of how the world evolves. Humans learn world models by curiously exploring their environment, in the process acquiring compact abstractions of high bandwidth sensory inputs, the ability to plan across long temporal horizons, and an understanding of the behavioral patterns of other agents. In this work, we study how to design such a curiosity-driven Active World Model Learning (AWML) system. To do so, we construct a curious agent building world models while visually exploring a 3D physical environment rich with distillations of representative real-world agents. We propose an AWML system driven by $\gamma$-Progress: a scalable and effective learning progress-based curiosity signal. We show that $\gamma$-Progress naturally gives rise to an exploration policy that directs attention to complex but learnable dynamics in a balanced manner, thus overcoming the "white noise problem". As a result, our $\gamma$-Progress-driven controller achieves significantly higher AWML performance than baseline controllers equipped with state-of-the-art exploration strategies such as Random Network Distillation and Model Disagreement.
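A learning-progress curiosity signal in the spirit of gamma-Progress can be sketched as the gap between the prediction loss of a slowly updated "old" model and the current model; in this illustrative stand-in (not the paper's implementation), an exponential moving average of the loss plays the role of the old model copy:

```python
# Hedged sketch: reward equals how much the world model has improved,
# measured against a slowly trailing copy of its own past performance.
class ProgressCuriosity:
    def __init__(self, gamma=0.99):
        self.gamma = gamma      # mixing rate toward the current loss
        self.old_loss = None    # EMA standing in for the old model's loss

    def reward(self, new_loss):
        if self.old_loss is None:
            self.old_loss = new_loss
        progress = self.old_loss - new_loss  # positive while the model improves
        # The "old" model slowly catches up, so learnable dynamics keep
        # paying out, while unlearnable white noise yields no lasting reward.
        self.old_loss = self.gamma * self.old_loss + (1 - self.gamma) * new_loss
        return progress
```

Because purely random inputs never reduce the loss, their progress signal stays near zero, which is the mechanism behind the "white noise problem" resistance described above.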


On Memory Mechanism in Multi-Agent Reinforcement Learning

Zhou, Yilun, Asher, Derrik E., Waytowich, Nicholas R., Shah, Julie A.

arXiv.org Artificial Intelligence

Multi-agent reinforcement learning (MARL) extends (single-agent) reinforcement learning (RL) by introducing additional agents and (potentially) partial observability of the environment. Consequently, algorithms for solving MARL problems incorporate various extensions beyond traditional RL methods, such as a learned communication protocol between cooperative agents that enables the exchange of private information, or adaptive modeling of opponents in competitive settings. One popular algorithmic construct is a memory mechanism, such that an agent's decisions can depend not only upon the current state but also upon the history of observed states and actions. In this paper, we study how a memory mechanism can be useful in environments with different properties, such as observability, internality, and the presence of a communication channel. Using both prior work and new experiments, we show that a memory mechanism is helpful when learning agents need to model other agents and/or when communication is constrained in some way; however, we must be cautious of agents achieving effective memoryfulness through other means.
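A minimal sketch of such a memory mechanism, with hypothetical names not taken from the paper: wrap a policy so its decision conditions on a bounded window of past observations rather than only the current one.

```python
from collections import deque

# Illustrative memory mechanism: decisions depend on a history window.
class HistoryPolicy:
    def __init__(self, base_policy, history_len=4):
        self.base_policy = base_policy
        self.history = deque(maxlen=history_len)  # bounded memory

    def act(self, observation):
        self.history.append(observation)
        return self.base_policy(tuple(self.history))

# Example of a behavior no memoryless policy can express: react to whether
# the observation changed since the previous step.
changed = lambda h: len(h) > 1 and h[-1] != h[-2]
agent = HistoryPolicy(changed, history_len=2)
```

In deep RL implementations the window is usually replaced by a recurrent network, but the interface is the same: the policy maps a history, not a single observation, to an action.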


Automatic Vehicle Checking Agent (VCA)

Ahmad, Bashir, Ahmad, Shakeel, Hussain, Shahid, Aslam, Muhammad Zaheer, Abbas, Zafar

arXiv.org Artificial Intelligence

A definition of intelligence is given in terms of performance that can be quantitatively measured. In this study, we present a conceptual model of an Intelligent Agent System for an Automatic Vehicle Checking Agent (VCA). To achieve this goal, we introduce several kinds of agents that exhibit intelligent features: the Management agent, Internal agent, External agent, Watcher agent, and Report agent. Metrics and measurements are suggested for evaluating the performance of the VCA. Calibrated data and test facilities are suggested to facilitate the development of intelligent systems.